[57] ABSTRACT

A system and method provides universal access to voice-based
documents containing information formatted using
MIME and HTML standards using customized extensions 
for voice information access and navigation. These voice 
documents are linked using HTML hyper-links that are 
accessible to subscribers using voice commands, touch-tone 
inputs and other selection means. These voice documents 
and components in them are addressable using HTML 
anchors embedding HTML universal resource locators 
(URLs) rendering them universally accessible over the Inter- 
net. This collection of connected documents forms a voice 
web. The voice web includes subscriber-specific documents 
including speech training files for speaker dependent speech 
recognition, voice print files for authenticating the identity 
of a user and personal preference and attribute files for 
customizing other aspects of the system in accordance with 
a specific subscriber. 

11 Claims, 12 Drawing Sheets 



[Representative drawing. Recoverable labels: Voice Web Services 200; Voice Web Service Agents; Server Software; Personal Profile; Service; Database; Service forms and pages; Internet; Voice Web Gateway 105; Voice Web Browser. Other reference numerals shown: 202, 203.]

06/30/2003, EAST Version: 1.04.0000 



5,915,001 










U.S. Patent Jun. 22, 1999 Sheet 1 of 12 5,915,001







[Sheet 4 of 12 (includes Figure 2C). Recoverable labels: Web Site 224; Gateway and Web Site 220; Web Site 225; Service Agents; Profile Service Agent; Server software; Service database; Profile database; Service forms and pages; Voice web browser; Subscriber; HTTP. Reference numerals shown: 106, 112, 201, 202, 203, 215, 216, 222.]







[Sheet 8 of 12. Figure 6: flow diagram of subscriber authentication, reference numerals 600 through 613. Recoverable box labels: Subscriber initiates access; Login processing; Present home page; Wait until subscriber submits form; Process submitted form; Signature stored; Signature not stored; Subscriber supplies voice signature and submits form; Test for a signature match; Match: access; No match: access denied; Present signature creation form; Wait until subscriber submits form; Subscriber records form and then submits.]






[Sheet 9 of 12. Figure 7: flow diagram of enhanced speech recognition, reference numerals 700 through 706 and 720. Recoverable box labels: Subscriber accesses the personal voice web; Present home page; Subscriber inputs voice commands; Determine selected service; Invoke service (load service page and load speech training page); Deliver service; Exit service.]






[Sheet 10 of 12. Figure 8: flow diagram of query customization, reference numerals 800 through 804. Recoverable box labels: Subscriber accesses the personal voice web; Present home page; Invoke service; Subscriber inputs voice commands; Deliver service (customize query forms using user attributes and preferences).]






[Sheet 11 of 12. Figure 9: flow diagram of voice web form publishing, reference numerals 900 through 904. Recoverable box labels: Voice Web Form Publishing; Present publishing form to caller/publisher; Caller supplies input and then submits the form; Process caller input (voice input and touch tone input); Store input.]




SYSTEM AND METHOD FOR PROVIDING 
AND USING UNIVERSALLY ACCESSIBLE 
VOICE AND SPEECH DATA FILES 

BACKGROUND OF THE INVENTION 

1. Field of the Invention 

This invention relates generally to the construction and 
use of distributed interactive voice and speech processing 
systems, including interactive voice response (IVR) systems 
and voice messaging (VM) systems. More particularly, the 
invention relates to form based publishing of voice infor- 
mation and the use of universally accessible personal pro- 
files for authentication of the user by voice signatures and 
generating context sensitive active vocabularies to improve 
speaker dependent speech recognition. The invention also 
relates to the use of the user attributes and preferences stored 
in universally accessible personal profiles to improve the 
efficiency of navigation and search as well as efficacy of 
search results pertaining to user queries. 

2. Description of the Related Art 

Conventional interactive voice response (IVR) systems 
allow a user to place a telephone call into a system, navigate 
(generally using touch tone input) through a hierarchy of 
options in response to voice prompts and retrieve informa- 
tion stored in a computer database. Airlines, banks, credit 
companies and many other service organizations are just a 
few examples of the types of businesses using IVR systems 
to allow a customer (or prospective customer) to retrieve 
desired information. These conventional systems are gener- 
ally organization-specific in that they offer access to a single 
database or set of databases related to the goods, services or 
other aspects of the organization maintaining the IVR sys- 
tem. Thus, conventional IVR technology is used to offer 
access to information specific to a single organization (i.e. a 
specific airline, bank or credit company). For example,
airlines typically use IVR to allow callers to access flight
arrival and departure information or to select reservation 
options, for the particular airline only. 

It is desirable to provide an IVR system that enables 
access to an aggregation of databases and services rather 
than a single database and service. One barrier to the 
provision of aggregated services in an IVR system is that 
conventional IVR systems do not have a distributed infor- 
mation publishing means. Conventional IVR systems do not 
have a mechanism for service/information providers to 
readily access the IVR system and add updated or entirely 
new information for publication on the IVR system. 

Further, conventional IVR systems are generally config- 
ured for uniform access by any caller admitted to the IVR 
system. Each caller is handled by the system in the same 
manner and offered an identical set of options. One reason 
that IVR systems use uniform user interfaces for each caller 
rather than caller-specific configurations is that conventional 
IVR systems operate in "closed" computer environments 
hosting the particular IVR system. Thus, when a caller 
accesses a conventional IVR system, the only caller-specific
information which the system has at its disposal is any
information previously provided by the caller which the 
system has maintained or any information that is provided 
by the caller during the IVR session (i.e. when a user enters 
an account number using touch tone telephone input). 
Because, however, collecting and storing caller-specific 
information with conventional technology is cumbersome 
and time consuming, most IVR systems do not offer caller- 
specific (caller customized) features. 

There are numerous applications in which it is desirable 
for an IVR system to use caller-specific information in 
handling a call. Caller-specific information in the form of
user preferences can aid in minimizing the size of a command
tree which the user must navigate to access desired
information. Additionally, caller-specific information could
also be used to authenticate the identity of a user in cases
where security is an issue (i.e. in bank and credit contexts).
Further, caller-specific speech training profiles could be used
to implement speaker dependent speech recognition to allow
for a caller to use voice commands in place of touch-tone
commands. Still further, an IVR system having access to
caller-specific data could be used to apply IVR technology
in new application areas such as personal productivity.

Thus, there is a need for an improved voice and speech
processing system that provides universal access to
caller-specific information to provide user-customized IVR
systems. Further, there is a need to provide universal access to
voice and speech files in order to allow widespread use of
such files for caller authentication and for performing
speaker dependent speech recognition in IVR systems.

SUMMARY OF THE INVENTION

The system and method of the present invention extends
World Wide Web (referred to herein as "www" or the "web")
and Internet technology to provide universally accessible
caller-specific profiles that are accessed by one or more IVR
systems. The invention features a set of web pages containing
information (components) formatted using MIME and
hypertext markup language (HTML) standards with extensions
for voice information access and navigation. These
web pages are linked using HTML hyper-links that are
accessible to users via voice commands and touch-tone
inputs. These web pages and components in them are
addressable using HTML anchors and links embedding
HTML universal (uniform) resource locators (URLs), rendering
them universally accessible over the Internet. This
collection of connected web pages is referred to herein as
the "voice web" and the individual pages are referred to
herein as "voice web pages". Each web page in the voice
web contains a specially tagged set of key words and touch
tone sequences that are associated with embedded anchors
and links used for navigation within the web.

In addition, the invention features a set of linked HTML
pages representing the user's "personal profile". The personal
profile contains the user's attributes and preferences.
Attributes include the user's name, address, phone number,
personal identification code, voice imprints for
authentication, speech training profile and other information.
Preferences include configuration preferences such as
personal greetings and gender and language selection,
selection preferences such as bookmarks and favorite places, and
presentation preferences such as priority ordering, default
overrides and preferred vocabulary.

The personal profile is designed for component access
within web pages, allowing easy extraction of context
sensitive profile information. In particular, speech training
profiles (included as a user attribute and which contain word
patterns representing speaker dependent training
information) are partitioned into sets of related words likely to
occur in combination within corresponding voice web
pages. A set of command and control words such as "play,
pause, continue, previous, next, home, reload, help, etc." is
stored in a top level component set enabling user dependent
but context independent navigation and control. Other
component sets are designed to match the key word sets in
corresponding voice web pages such as a calendar page or
an address book page enabling user and context dependent
navigation and control.
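
By way of illustration only, the partitioning described above may be sketched as a mapping from component-set names to word patterns, from which the active vocabulary for a given page is assembled. The names `SpeechProfile`, `component_sets` and `active_vocabulary`, the set name "top", and the placeholder pattern strings are assumptions made for this sketch; the specification does not prescribe a data layout.

```python
from dataclasses import dataclass, field

# Context independent command and control words named in the text.
COMMAND_WORDS = ["play", "pause", "continue", "previous", "next",
                 "home", "reload", "help"]


@dataclass
class SpeechProfile:
    """Hypothetical container for one subscriber's speech training data."""
    # Maps a component-set name (e.g. "top", "calendar") to
    # {word: speaker dependent training pattern}.
    component_sets: dict = field(default_factory=dict)

    def active_vocabulary(self, page_name, page_keywords):
        """Return {word: pattern} for the top level set plus the page's set."""
        # The top level set is always active (context independent).
        active = dict(self.component_sets.get("top", {}))
        # Add only those page key words that have a trained pattern.
        page_set = self.component_sets.get(page_name, {})
        for word in page_keywords:
            if word in page_set:
                active[word] = page_set[word]
        return active


profile = SpeechProfile(component_sets={
    "top": {w: f"<pattern:{w}>" for w in COMMAND_WORDS},
    "calendar": {"today": "<pattern:today>", "tomorrow": "<pattern:tomorrow>"},
    "address book": {"lookup": "<pattern:lookup>"},
})

# Only the top level set and the calendar set are loaded for a calendar page.
vocab = profile.active_vocabulary("calendar", ["today", "tomorrow"])
```

In this sketch the address book patterns are never retrieved while the calendar page is presented, which is the reduction in active vocabulary that the partitioning is intended to achieve.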
When a user calls into the distributed voice and speech
processing system associated with the voice web, the system
first identifies the user utilizing a unique account number
(such as phone number or social security number). Next, it
accesses the user's personal profile using the corresponding
URL and retrieves the user attributes and preferences related
to authentication and security. Using this personal profile
information, the voice web system authenticates the identity
of the user using a combination of personal identification
code based password checking and voice imprint matching.
The voice imprint is any sufficiently long utterance or phrase
that the user has previously entered into his/her profile. Each
user's voice imprint is analyzed and stored in the profile for
quick matching on demand with a real-time provided user
sample. The combination of every individual's unique vocal
characteristics stored in the voice imprint coupled with the
random choice of the password phrase ensures a high degree
of security and authentication.

Once authenticated, the user is allowed to navigate and
access more information from the voice web using voice
commands. In order to effectively accomplish this task, the
voice web system retrieves the context independent
command and control key word set from the user's speech
profile.

The voice web system then presents a top level voice web
personal home page for user's perusal. At the same time, it
retrieves the set of word recognition patterns associated with
the key words in the presented page from the user's speech
profile. Thus, the system is able to match the active
vocabulary and associated speaker dependent word patterns
dynamically in a context sensitive manner. The process
continues as the user navigates from page to page. The voice
web system dynamically retrieves the suitable subset of
training word patterns from the user's speech profile
matching the voice navigation key words in the page being
presented to the user.

The process described above greatly reduces the size of
the training information that needs to be retrieved at any
time while significantly enhancing accuracy of speech
recognition using speaker dependent training profiles. Since the
speech profile is constructed using HTML pages and
components, it is universally accessible using its URL. This
enables the user to call into any compatible Internet
connected voice web system in user's proximity from anywhere
in the world, identify himself/herself to the system and then
enable the system to dynamically retrieve suitable
information that enhances his/her navigation and access of the
information stored in the voice web using voice commands
and input.

In addition to the user attribute information discussed
above, the personal profile contains user preferences relative
to configuration, presentation and information selection.
These preferences are components within the personal
profile pages and are easily available to the voice web system
for dynamic retrieval. For example, if the user requests
his/her stock portfolio from the voice web, it first retrieves
the user's preferred portfolio of companies from his/her
profile and applies this list to limit the search on stock quotes
from all companies. The user gets exactly the information
relevant to his/her interest in exactly the order of priority
he/she prefers.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a functional block diagram of a voice web
system in accordance with the present invention.

FIG. 2A is a functional block diagram of the voice web
system shown in FIG. 1 configured to provide voice web
services.

FIG. 2B is a functional block diagram of an exemplary
calendar service.

FIG. 2C is a functional block diagram of an alternative
configuration of a voice web system in accordance with the
present invention.

FIG. 3 illustrates a personal voice web used to provide
personal services using the system shown in FIG. 2A.

FIG. 4 illustrates a hierarchy of speech training pages that
correspond to the service pages shown in FIG. 3.

FIG. 5 illustrates a hierarchy of attributes and preferences
pages that correspond to the service pages shown in FIG. 3.

FIG. 6 is a flow diagram of a subscriber authentication
method used in the delivery of the personal voice web
services shown in FIG. 3.

FIG. 7 is a flow diagram of an enhanced speech
recognition process used in personal voice web systems shown
in FIG. 3.

FIG. 8 is a flow diagram of a query customization process
in accordance with the present invention.

FIG. 9 is a flow diagram of a voice publishing method in
accordance with the present invention.

FIG. 10 is a system diagram of a business-yellow-order
page system in accordance with the present invention.

DESCRIPTION OF A PREFERRED
EMBODIMENT

The figures depict a preferred embodiment of the present
invention for purposes of illustration only. One skilled in the
art will readily recognize from the following discussion that
alternative embodiments of the structures and methods
illustrated herein may be employed without departing from the
principles of the invention described herein.

System Description

FIG. 1 is a functional block diagram of a voice web
system 100 in accordance with the present invention. Voice
web system 100 extends the conventional internet and world
wide web ("web" or "www") technology to voice and speech
processing applications and also enables new uses for
interactive voice response (IVR) technology. Voice web system
100 includes one or more voice web sites 102 coupled to one
or more voice web gateways 105 via the internet 101. Voice
web sites 102 and voice web gateways 105 transfer files over
the internet 101 in accordance with hypertext transport protocol
(HTTP). A subscriber 107 accesses the voice web system
100 by coupling to the gateway 105 using a telephone 111
coupled to the public switched telephone network (PSTN)
109.

Internet 101 is a system of linked communications
networks that facilitate communication among computers
which are coupled to internet 101. Generally, internets such
as Internet 101 facilitate communication by providing file
transfer, electronic mail and news group services. Internet
101 is preferably the Internet which evolved from the
ARPANET and which is publicly accessible world wide. It
should be understood, however, that the principles of the
present invention apply to other internets and even closed
(private) networks such as corporate intranets.

It should be noted that system 100 may include numerous
voice web sites 102 and numerous voice web gateways 105.
A single voice web site 102 and a single voice web gateway
105 are shown in FIG. 1, however, to keep the figure
uncluttered. Thus, voice web system 100 is a collection of
voice web gateways 105 and voice web sites 102 connected
over internet 101 enabling subscribers 107 to access voice
web pages 103 via their telephones as shown in FIG. 1. 

A voice web page 103 is a web page specified using a
navigable markup language that includes voice extensions.
A navigable markup language is an enhanced type of
markup language that facilitates publication, navigation and
access of information stored in documents specified in the
navigable markup language. An exemplary markup
language is the Hypertext Markup Language 2.0, RFC1866,
HTML working group of Internet Engineering Task Force,
Sep. 22, 1995, edited by D. Connolly published on the www
at the following uniform resource locator (URL) address:
http://w3.org/pub/www/Markup/html-spec.

A markup language is a language that includes a set of
conventions for marking portions of a document so that,
when accessed by a parsing program such as a web browser,
each marked portion is presented to a user with a distinctive
format. In contrast to formatting codes used by word
processing programs, markup language codes, called tags, do
not specify exactly how the tagged portion should be
presented. Instead the tags inform the web browser (parser) that
the information is in a certain portion of a document such as
title, heading, form or text and the like. The web browser
(parser) determines how to present the tagged information.

A navigable markup language is an enhanced markup
language that uses tags that are anchors and that are links.
When these link and anchor tags are invoked, a user is then
presented another navigable markup language document in
accordance with the link and anchor tags. This link is
sometimes called a hyperlink. A hyperlink is a reference to
another markup language document which when invoked
facilitates access of the referenced markup language
document.

A navigable markup language thus uses attributes, tags
and values that enable (i) a publisher to specify the presen- 
tation of information to a user; (ii) a user to interactively 
access the stored information; and (iii) a user to access other 
navigable markup language documents using hyperlinks. 

The navigable markup language used to specify voice
web pages 103 is Hyper Voice Markup Language (HVML).
HVML is a version of HTML that includes voice extensions
as described in Appendix A, incorporated herein by
reference. Voice web pages 103 include HVML tags and
attributes that extend HTML to facilitate publication,
navigation and access to voice information. For example, HVML
specifies functions and protocols that facilitate voice and
speech processing including voice authentication, speaker
dependent speech recognition, voice information publishing
(e.g. creating a voice form) and voice navigation.

Just as conventional web documents are displayed for the
user, voice web documents 103 are "played" to a subscriber
over a telephone. A voice web page 103 is played (by voice
web browser 106) by sequentially presenting the embedded
voice components according to the HVML and MIME
specifications.
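
By way of illustration only, sequential presentation of embedded voice components may be sketched as a parser that collects each voice element in document order. The voice tag and its type attribute (file, text, number, stream and so on) are taken from the HVML Presentation discussion of this specification; the `src` attribute name, the default type, and the sample page are assumptions made for this sketch, since the actual syntax is defined in Appendix A, which is not reproduced here.

```python
from html.parser import HTMLParser

# Hypothetical voice web page; the "src" attribute name is an assumption.
SAMPLE_PAGE = """
<html><body>
<voice type="file" src="http://example.com/greeting.vox"></voice>
<voice type="text">You have two appointments today.</voice>
<voice type="number">2</voice>
</body></html>
"""


class VoiceComponentParser(HTMLParser):
    """Collects (type, value) pairs for each voice element, in page order."""

    def __init__(self):
        super().__init__()
        self.components = []
        self._current = None  # [type, src, text parts] while inside <voice>

    def handle_starttag(self, tag, attrs):
        if tag == "voice":
            a = dict(attrs)
            # Defaulting a missing type to "text" is an assumption.
            self._current = [a.get("type", "text"), a.get("src"), []]

    def handle_data(self, data):
        if self._current is not None:
            self._current[2].append(data)

    def handle_endtag(self, tag):
        if tag == "voice" and self._current is not None:
            vtype, src, parts = self._current
            # File and stream components are located by URL; the other
            # types carry their value as element text.
            value = src if vtype in ("file", "stream") else "".join(parts).strip()
            self.components.append((vtype, value))
            self._current = None


def play_plan(page):
    """Return the ordered list of components a browser would present."""
    parser = VoiceComponentParser()
    parser.feed(page)
    parser.close()
    return parser.components


plan = play_plan(SAMPLE_PAGE)
```

A voice web browser would then walk `plan` in order, fetching and playing a file, synthesizing text, or concatenating pre-recorded fragments for a number, according to each component's type.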

While a conventional web site enables on-demand access
over an internet to conventional web pages, voice web site
102 enables on-demand access to voice web pages 103.
Voice web site 102 is a computer that hosts voice web pages
103 and serves them up to other computers (i.e. voice web
gateway 105). More specifically, voice web server 102 is a
computer configured with conventional web server software
112 and which has access to stored voice web pages 103. A
voice web site 102 optionally also includes a subscriber
directory 104 that stores a list of registered system
subscribers. Voice web site 102 stores, serves and manages
voice web pages 103 and can execute associated external
scripts or programs in accordance with the present
invention. These external scripts and programs interface with
databases and other information sources both internal and
external to web site 102.

Voice web gateway 105 is a computer connected to the 
internet 101. Voice web gateway 105 also includes a con- 
ventional voice telecommunications interface 114 for cou- 
pling to the public switched telephone network (PSTN) 109 
for telephonic communications with a subscriber 107. Tele- 
phone 111 is any voice enabling telecommunications device. 
Exemplary telephones include conventional desktop 
telephones, portable telephones, cellular telephones, analog 
telephones, digital telephones, smart phones and a computer 
configured to operate as a telephone and perform telephonic 
functions. Thus voice web pages 103 are universally acces- 
sible from any ordinary telephone 111. Alternatively, a 
subscriber 107 may access voice web pages 103 either by 
using a subscriber interface local to voice web gateway 105 
(i.e. a direct user interface with voice web gateway 105) or 
by dialing into voice web gateway 105 using another com- 
puter such as a personal digital assistant or a smart phone. 

Voice telecommunications interface 114 serves as an 
interface between a voice web browser 106 and telephone 
111 and preferably includes conventional telephony and 
voice processing hardware and software enabling voice web 
gateway 105 to receive and answer telephone calls, respond 
to touch tone and voice commands, route and conference 
calls, play voice prompts and record voice messages. 

Voice web gateway 105 additionally hosts a voice web 
browser 106. Voice web browser 106 is a computer program 
capable of accessing and processing voice web pages 103 in 
response to a request placed by subscriber 107. More 
specifically, voice web browser 106 (i) processes voice and 
touch tone activated subscriber commands, (ii) retrieves 
requested voice web pages 103 from the appropriate voice 
web site 102, (iii) interprets the embedded markup language 
(HVML) in the retrieved voice web page 103 and (iv) 
delivers the contents of a voice web page 103 to a subscriber 
107 over the telephone 111. In performing the above- 
mentioned processing, voice web browser 106 executes 
scripts, including "voice scripts" embedded in a voice web 
page 103. Voice web browser 106 provides a subscriber 107 
with fast, easy, convenient voice activated navigation and 
access to voice web pages 103. 
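
By way of illustration only, steps (i) and (ii) above may be sketched as resolving a touch tone or spoken word to the hyperlink of the page to retrieve. The tone and label attributes on the anchor tag are taken from the HVML Navigation discussion of this specification; the names `AnchorIndex` and `resolve`, the case-insensitive label matching, and the sample page and URLs are assumptions made for this sketch.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical home page and base URL used only for this sketch.
BASE_URL = "http://voice.example.com/home/index.html"
HOME_PAGE = """
<a href="calendar.html" tone="1" label="calendar">Calendar</a>
<a href="/addr/index.html" tone="2" label="address book">Address book</a>
"""


class AnchorIndex(HTMLParser):
    """Indexes anchors by their tone and label attribute values."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.by_tone = {}
        self.by_label = {}

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        a = dict(attrs)
        href = a.get("href")
        if not href:
            return  # an anchor without href is not a hyperlink
        target = urljoin(self.base_url, href)
        if a.get("tone"):
            self.by_tone[a["tone"]] = target
        if a.get("label"):
            # Lower-casing labels for matching is an assumption.
            self.by_label[a["label"].lower()] = target


def resolve(page, base_url, tone=None, utterance=None):
    """Return the URL selected by a touch tone or recognized word, or None."""
    index = AnchorIndex(base_url)
    index.feed(page)
    index.close()
    if tone is not None and tone in index.by_tone:
        return index.by_tone[tone]
    if utterance is not None:
        return index.by_label.get(utterance.lower())
    return None


by_tone = resolve(HOME_PAGE, BASE_URL, tone="1")
by_voice = resolve(HOME_PAGE, BASE_URL, utterance="Address Book")
```

The returned URL is what the browser would then request from the appropriate voice web site over HTTP; speech recognition itself is outside the scope of the sketch, which assumes the utterance has already been recognized as text.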

Voice web browser 106 is a conventional web browser 
modified with appropriate voice information playback and 
recording extensions and enhancements. Appendix A 
includes a specification of HVML and voice web browser 
commands and is incorporated herein by reference. 

Some voice web pages 103 contain references to scripts 
and programs that operate as service agents 110 to respond
to subscriber requests as well as external events and carry 
out prescribed actions. These scripts and programs are 
externally stored on voice web sites 102 (for example as 
Common Gateway Interface (CGI) Scripts or Internet Ser- 
vices Application Programming Interface (ISAPI) 
programs). These external scripts and programs execute in 
the voice web server 102 environment as a service agent 
110. The external scripts and programs that comprise service 
agents 110 are referred to by URLs embedded in an asso- 
ciated voice web page 103. In the case of a voice web page 
103 that is a voice form, the script or program associated 
with the service agent executes in response to voice form 
submission by a subscriber 107. Service agents 110 follow 
standard Internet protocols such as HTTP, and conform to 
conventional formats such as MIME and application
programming interfaces (APIs) such as CGI and ISAPI.

HVML Description

Conventional web pages are designed primarily for
presentation on a computer color monitor and navigation by a
mouse and key board. As such, graphics, images and text are
the primary media types supported widely. Although audio,
video and 3-dimensional graphics extensions are becoming
available, these extensions are directed primarily at
computer users and not telephone users.

Voice web pages 103 consist of HTML pages that have
been extended with Hyper Voice Markup Language
(HVML) for easy and effective navigation and access of
voice information via a voice activated device such as an
ordinary telephone. Voice web pages 103 retain all the
properties and behavior of conventional HTML pages such
as HTML markup tags, universal identifiers (URLs), and
hyper-links and can be accessed by a conventional web
browser using HTTP protocols from a conventional web
server. The additional markup tags are interpreted by an
HVML extended web browser to enable subscribers 107 to
navigate and access voice web pages 103 over the phone or
similar voice activated device. Appendix A includes a
specification of HVML and voice web browser commands and is
incorporated herein by reference.

HVML web pages (voice web pages 103) are specially
designed for presentation using an ordinary telephone 111
and navigation using touch tones and voice commands. This
is in contrast to conventional multimedia web pages that
may embed audio data to be presented on a multimedia
personal computer using its speakers and navigated using its
mouse, key board and microphone. Although HVML voice
web pages 103 can be embedded in generic multimedia web
pages, thus sharing some of the information, they are
designed to be presented using an ordinary phone and
navigated using commands generated by touch tone signals
and speech recognition.

An HVML web page (voice web page 103) is first and
foremost an HTML page. Each web page 103 has a unique
universal resource locator (URL) (also called uniform
resource locator). A URL is a string of characters that
uniquely identifies an internet resource including an
identification of (i) the access protocol to be used; (ii) an
indication of resource type; and an identification of its location in
[...]

[...] monitor, it can be navigated using the computer's mouse,
keyboard, and (with some additional plug-ins) microphone,
and it can contain embedded anchors and hyper links to
other HTML pages, including other HVML pages.

Voice web pages 103 are designed for three primary
purposes: (i) presenting structured voice information to a
user; (ii) enabling the user to navigate across and within
voice pages; and (iii) capturing user input for information
queries or submission.

a. HVML Presentation

Presentation of voice information is accomplished
primarily by the voice tag. The voice tag has a type attribute which
specifies the type of voice information to be presented. If the
type attribute has the file value, the voice information is
obtained from a voice file specified by its URL. If the type
attribute has the text value, the voice information is
synthesized from the specified text. If the type attribute has
number, ordinal, currency, date, or character value, then the
voice information is generated by concatenating voice
fragments from a pre-recorded indexed system voice file. If the
type attribute has the stream value, then the voice
information is obtained from the voice stream specified by its URL.
Composition of several voice elements into a seamless voice
string is accomplished by the voice-string tag.

Combining these tags, publishers can compose and
present: (i) pre-recorded voice prompts and messages; (ii)
voice prompts generated using text-to-speech technology;
and (iii) pre-formatted voice prompts with dynamic speech
synthesis elements.

b. HVML Navigation

Navigation of voice web pages 103 is primarily
accomplished by extending the HTML anchor tag with new
attributes: tone and label. These attributes are used in
conjunction with the existing href attribute in an anchor
element that makes the anchor into a hyper link. When the
user selects the touch tone signals specified by the value of
the tone attribute or utters the word specified by the label
attribute, the browser invokes the corresponding hyper link.
The tone and label attribute values must be unique within a
page. Navigation is also accomplished by system commands
such as next, previous, reload, home, bookmarks, help, fax,
and history which are invoked by specific touch tone
sequences or utterance of the words. Users can control the
voice browser operations by issuing system commands such
as stop, start, play, pause, exit, backup, and forward. Using

the computer network. For example, the following fictitious these attributes, publishers can enable (i) touch tone com- 

URL identifies a www document: http://www.voiscorp.com/ marid and control and link navigation; (ii) pre-defined, 

banner.gif uniquely identifies the location of a resource on system and user specific, spoken command and control key 

the world wide web computer network, "http://" indicates 50 word recognition; and (iii) page and user specific spoken 

the access protocol, "www.voiscorp.com" is the domain command and control key word recognition, 

name of the computer on which the resource is located. c, HVML Forms 

"banner" is the name of the resource located on the computer HVML uses the form tag to enable user input similar to 

specified by the domain name, "gif ' indicates that the banner HTML including the method attribute which specifies the 

resource is a gif (graphical interchange file) type resource. 55 way parameters are passed to the server and the action 

Similarly, the following fictitious URL uniquely identifies attribute which specifies the procedure to be invoked by the 

the location of a voice web page 103: http:// server to process the form. HVML extends the input tag 

www.voiscorp.com/voicememo.hvml. In this example, within forms by introducing voice-input tag. Voice-input 

"voice memo" is the name of the resource located on the takes a type attribute similar to the input tag with three new 

computer specified by the domain name, "hvml" indicates 60 values "voice", "tone" and "review" in addition to the 

that the voicememo resource is an hvml type resource. Thus, existing "reset" and "submit" values. The HVML browser 

web pages 103 are each uniquely identified by their corre- pauses at each voice-input statement in a HVML form until 

sponding URL. Once located, a web page 103 can be the specified input is supplied or input is terminated, before 

created, edited and played using existing web publication processing the remaining form. Using these tags and 

tools, it can be stored on any conventional web server 65 attributes, publishers can enable: (i) touch tone command 

anywhere on the Internet, it can be accessed by any con- and control and parameter input; (ii) pre-defined, user 

ventional web browser and presented on a computer specific, spoken alphabet and digit input; (iii) page and user 
specific, spoken key word and proper names input; and (iv) free form voice information input.

Operational Description of the Voice Web Browser

Syntactic and structural intelligence, such as in-line pre-recorded voice prompts, pre-formatted voice prompts with dynamically generated voice elements, key word accessible anchor elements, voice responsive hyper links etc. is embedded in voice web pages 103 through voice access extensions to HTML. Behavioral intelligence including command interpretation, page access, file caching, HVML interpretation and user interaction is embedded in voice web browser 106 (the HVML browser). Voice web browser 106 has the following states: (i) waiting for user commands; (ii) active, accessing and playing HVML pages; and (iii) paused for user input.

Initially, voice web browser 106 is launched upon the system's receipt of a subscriber's telephone call. Once launched, voice web browser 106 goes through an initialization sequence that includes subscriber authentication and normally becomes active, accessing and playing the subscriber's home page. Once the home page is played, voice web browser 106 waits for subscriber commands. As part of playing the page, the browser may pause for subscriber input and continue once the input is provided.

Independent of any specific voice web page 103 that a subscriber may be accessing, voice web browser 106 provides a set of navigational and operational commands. Within the telephone key pad, "*" and "#" are special keys that generate unique tones. Voice web browser 106 has special meaning for these keys. In general, the "*" key followed by a sequence of touch tones, excluding the "#" key, signals a browser command, an escape or a skip and the "#" key signals a link activation, termination of form input, termination of a key sequence or a selection.

Voice Web Services

Voice web system 100 can be used to provide voice web services to a subscriber 107. A voice web service is a service that provides on-line telephone based access to information. The information is presented to the user through the publication of voice web pages 103. The information presented to (published for) the subscriber may be information retrieved from a single information source or a combination of information sources including publicly accessible on-line databases, information proprietary to voice web system 100, information previously stored by subscriber 107 or another information source. Exemplary services provided by voice web system 100 include (i) personal information services such as calendar, address book, electronic mail, voice mail, (ii) information services such as headline news, weather reports, sports scores, stock portfolio quotes, business white pages, yellow pages, classified information and (iii) transaction services (commerce services) such as banking, bill payments, stock trading, airline, hotel and restaurant reservations and catalog store orders.

Users gain access to voice web services by becoming voice web subscribers 107. Subscribers 107 preferably sign up (e.g. register) for services through a service provider. In one embodiment, each subscriber 107 is assigned a unique account number on a calling card and subscribers 107 access the voice web system 100 by dialing a single "800" (e.g. toll free) service phone number and by then supplying their account number via the telephone 111. In an alternative embodiment, the services are publicly available and any user placing a call into the system is processed as a subscriber 107 without requiring any registration.

FIG. 2A is a functional block diagram of a voice web system 200 configured to provide voice web services to a subscriber 107. Voice web system 200 includes one or more voice web gateways 105 coupled to one or more service sites 202 via internet 101. Service site 200 is a voice web site 102 configured to provide voice web services. Each voice web service is implemented using a collection of service agents 201 and service pages 203 centered around a service database 202. Additionally, service site 200 optionally includes a personal profile 204 to be used to the extent that the service being provided requires pre-stored subscriber-specific information (i.e. pre-stored information personal to the particular subscriber).

Voice web service agents 201 are a set of service agents 110 (shown in FIG. 1) that execute on service site 200 to provide voice web services to a subscriber 107. Voice web service agents 201 are therefore scripts and programs represented by a web page 103 (shown in FIG. 1).

Service database 202 is a database of information. The type of the information varies with the type of service being provided. For example, if voice web system 100 is configured to deliver a business white page service, service database 202 is a database of address and phone number information for businesses. If voice web system 100 is additionally or alternatively configured to deliver news headlines, then voice web system 100 includes a service database 202 that includes current news headlines.

Service forms and pages 203 are voice web pages 103 that are HVML templates (voice forms and pages) that are filled in in response to a specific subscriber request. Service pages and forms 203 are used to gather subscriber input, to retrieve information and to deliver (publish) information to a subscriber. Some service pages 203 are database entry and administration forms, some are database query forms and others are database response pages. Entry forms are used to add information to the database. Query forms are used to extract information from the database. Response pages are used to present retrieved information to the user. In the preferred embodiment, service agents dynamically generate service pages and forms 203 by retrieving requested data from service database 202 and using the retrieved data in place of corresponding variables stored in an HVML template. The HVML templates link to each other specifying request-response dependencies. Thus, subscribers 107 are able to enter and retrieve information in personal and external databases over internet 101 using web protocols without having to create a voice web page for each entry in service database 202.

Service agent 201 typically uses a service database 202 and a set of service pages and forms 203 to provide the corresponding voice web service. The service database 202 hosts the information that subscribers 107 wish to access. The service forms allow subscribers 107 to input and query information in service database 202. Service pages allow service agents 201 to present the requested information to the subscriber 107 using voice web browser 106.

FIG. 2B is a functional block diagram of an exemplary calendar service. The calendar service agent 210 uses the calendar database 211 together with the calendar and appointment details input and query voice web forms 212 and appointment list and details voice web pages 213. Subscribers fill in the calendar and appointment details input voice web forms 212 to set their calendar appointments and their details. The calendar service agent 210 processes the submitted form and updates the calendar service database 211. Later, subscribers can retrieve their appointments for
any day by supplying 214 the month, date and year for that day in the calendar query voice web form 212. The calendar service agent 210 processes the submitted form, retrieves the matching appointments from the calendar database, and dynamically composes and returns the appointment list voice web page 213. If the subscriber requests the details of any appointment, the calendar service agent 210 dynamically generates and supplies the corresponding appointment details page 213.

The Personal Voice Web

FIG. 3 shows a personal voice web 300 in accordance with the present invention. Personal voice web 300 is a standardized collection of linked voice web pages and voice web forms (a special type of voice web page) that form a personal service space for the subscriber. Preferably, all subscribers share a common structure of linked voice web pages although the contents of personal voice web pages vary from subscriber to subscriber. Because each subscriber of the personal voice web system 300 has the linked page structure shown in FIG. 3, subscribers navigate about and access information from their personal voice web 300 in a standardized way. Each page in personal voice web 300 includes an agent that performs various processing tasks required for each respective page. At the root of personal voice web 300 is the personal home page 301. Personal home page 301 links to a personal profile page 302, a personal administrative assistant page 303, a personal helpdesk page 304, and a personal commerce page 305.

The personal administrative assistant page 303 is linked to a number of personalized voice web services (service pages) 330 including, by way of example, a calendar and appointments page 309, an address book page 310, a stock portfolio page 311, a news headlines page 312, a mail box page 313, and a business white pages home page 314.

Calendar and appointments page 309 is used to provide an appointments service. The appointments service enables a subscriber to track personal and business appointments in a voice-based calendar. The subscriber thus adds and retrieves appointments over the phone using personal voice web 300. In addition to providing day and time information related to stored appointments, a subscriber may also store voice note annotations that are associated with a particular appointment.

Address book page 310 is used to provide an address service. The address service enables a subscriber to add and retrieve address, phone number, and other information related to individual names or company names. The information added and retrieved is stored in an address book service database private to the subscriber.

Stock portfolio page 311 is used to provide a stock quote service. The stock service enables a subscriber to retrieve current stock pricing and portfolio valuation information as well as statistical information related to changes in portfolio or stock positions. The stock service uses information retrieved from a stock portfolio service database private to the subscriber and additionally retrieves current stock pricing information from an on-line database or information source.

News headlines page 312 is used to provide a news service. The news service enables a subscriber to retrieve news headlines related to subscriber customized topics.

Mail box page 313 is used to provide a mailbox service. The mailbox service enables a subscriber to access electronic mail (e-mail) messages. The e-mail messages are played for the subscriber using text to speech conversion and a speech synthesizer.

Business white pages home page 314 is used to provide a white page service. The white page service enables a subscriber to enter partial company name, and optionally city name and state code to retrieve the company's full name, address and phone number.

Each service page 309-314 is part of a collection of voice forms and pages that are used by the corresponding service agent to retrieve a request from the subscriber, generate an appropriate database query responsive to the subscriber request, retrieve subscriber-requested information, and generate a voice web page that incorporates the retrieved information and that is adapted for presentation (publication) to the subscriber using a voice web browser. Thus, for example, the service agent associated with calendar and appointments page 309 generates a voice form for prompting a subscriber for month, day and year information. After receiving the prompted information, the calendar and appointments service agent generates the appropriate query to extract the requested calendar information from a calendar service database. Once the calendar information is retrieved from the database, the calendar and appointments agent generates a voice web page that includes the retrieved information. The new page is then presented (published) to the subscriber over the telephone by the voice web browser.

Each of the other service agents associated with personal pages 309-327 operates in a similar manner to provide a subscriber with information retrieved from associated databases.

Personal helpdesk page 304 is linked to personal voice web helpdesk pages 331 including, by way of example, a hotels page 315, an airlines page 316, a rental cars page 317, a travel agents page 318, a restaurants page 319, a financial services page 320, and a banks page 321. The personal helpdesk page has an associated personal helpdesk agent that is used to provide a set of helpdesk services. Helpdesk services enable a subscriber to access product, pricing, availability and other information of the corresponding services.

Hotels page 315 is used to provide a hotel reservation service. Airlines page 316 is used to provide an airline booking service. Rental cars page 317 is used to provide a rental car reservation service. Travel agents page 318 is used to provide a travel service. Restaurants page 319 is used to provide a menu and reservations service. Financial services page 320 is used to provide a financial service. Banks page 321 is used to provide a bank service.

Personal commerce page 305 is linked to personal voice web commerce service pages 332 including, by way of example, an apparel shops page 322, a luggage stores page 323, a gift shops page 324, a flower shops page 325, an office supplies stores page 326, and a book stores page 327. The personal commerce page provides commerce services that enable a subscriber to access catalogs associated with various retail establishments. As part of the commerce service, the personal voice web allows a subscriber to shop in various catalogs and then submit orders for selected items directly to the sponsor of the associated catalog. Orders are submitted to the catalog sponsor either as a voice web form or conventional web form sent to the sponsor, as an electronic message or using another means.

Personal profile page 302 links to a set of personalized voice web profile pages including an authentication page 306, a speech profile page 307, and an attributes and preferences page 308.

User authentication page 306 contains authenticating information including a subscriber account number, an encrypted password or personal identification number and links to a voice authentication signature MIME resource.

Speech profile page 307 is linked to a hierarchy of speech 
training pages that correspond to the hierarchy of personal 
voice web 300. FIG. 4 shows the hierarchy 400 of speech 
training pages 401-427. Speech training pages 401-427 are 
sets of pre-captured training files to be used in performing 
speaker dependent speech recognition in providing the cor- 
responding service to a subscriber. Each speech training 
page is thus accessed by the corresponding agent in per- 
forming the corresponding service. For example, the admin- 
istrative assistant service accesses administrative speech 
training set 431 (including speech training pages 409-414). 
The helpdesk service accesses the helpdesk training page set 
432 (including speech training pages 415-421). The com- 
merce service accesses the commerce training page set 433 
(including speech training pages 422-427). 
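
The grouping of speech training pages into per-service sets can be pictured as a simple lookup. The following Python sketch is illustrative only: the set numbers and page ranges are taken from the description of FIG. 4 above, while the names `TRAINING_SETS` and `training_set_for` are hypothetical and do not appear in the specification.

```python
# Speech training page sets as described for FIG. 4.
# Keys are service names; values are (set number, page numbers in the set).
TRAINING_SETS = {
    "administrative assistant": (431, range(409, 415)),  # pages 409-414
    "helpdesk": (432, range(415, 422)),                  # pages 415-421
    "commerce": (433, range(422, 428)),                  # pages 422-427
}


def training_set_for(page_number):
    """Return (service, set number) for a speech training page, or None."""
    for service, (set_number, pages) in TRAINING_SETS.items():
        if page_number in pages:
            return service, set_number
    return None
```

Under these assumptions, a service agent would look up the set for its own service and load only those training pages, rather than the whole hierarchy 400.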

Each speech training page 401-427 includes training data 
specifically tailored to the words more commonly associated 
with the corresponding service. For example, the calendar 
speech training page 409 includes training vocabulary to aid 
in the recognition of voice commands such as "Tenth", 
"November", "Tuesday" and so forth. 

Referring now again to FIG. 3, personal attributes and 
preferences page 308 includes subscriber attribute informa- 
tion including name, account number, address, voice tele- 
phone number, fax telephone number, paging telephone 
number, encrypted credit card numbers and the like as well 
as personal preference information such as configuration, 
selection and presentation preferences. Personal attributes 
and preferences page 308 is also linked to a hierarchy of
attribute and preferences pages (shown in FIG. 5) that 
correspond to the hierarchy of personal voice web 300. 

FIG. 5 shows the hierarchy of attributes and preferences 
pages 501-527 associated with personal attributes and pref- 
erences page 308. Attributes and preferences pages 501-527 
are pages that store subscriber-specific preference informa- 
tion to be used in providing the corresponding service to a 
subscriber. Each attributes and preferences page 501-527 is
thus accessed by the corresponding agent in performing the 
corresponding service. For example, the administrative 
assistant service accesses attributes and preferences set 531 
(including attributes and preferences pages 509-514). The 
helpdesk service accesses the helpdesk attributes and preferences
set 532 (including attributes and preferences pages 515-521). The
commerce service accesses the commerce attributes and preferences
set 543 (including attributes and preferences pages 522-527).

It should be noted that the user profile information for 
multiple subscribers is stored in user profile databases. The 
user profile databases are accessed by service dependent 
profile agents. For example, personal identification and 
verification information of multiple subscribers is stored in 
a user profile home page database (a service database) and 
accessed by the subscriber's profile home page agent. Cal- 
endar attributes and preferences information for multiple 
subscribers is stored in the subscriber calendar attributes and 
preferences profile database (a service database). Calendar 
service specific speech training information for multiple 
subscribers is stored in the subscriber calendar speech 
training profile database (a service database). Calendar ser- 
vice profile agent responds to HTTP form requests for 
calendar attributes and preferences or calendar speech train- 
ing profile page information for any particular subscriber 
and supplies the appropriate subscriber profile page infor- 
mation as HVML voice web pages. 
The collection of profile pages for a single user constitutes that user's personal voice web profile 300. Personal voice web profile 300 need not be a collection of static HVML pages (voice web pages), but may instead be generated dynamically using user profile page databases. However, once generated, these profile pages can be reused from various cache systems within the voice web system without having to retrieve them from their original databases, thus saving significant time and resources.
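
The reuse of dynamically generated profile pages can be sketched as a small cache keyed by page URL. This is a minimal illustration, not the system's actual cache design: the class `ProfilePageCache`, the callable `generate_profile_page` and the example URL path are all hypothetical, and the sketch omits any invalidation policy for when a profile changes.

```python
class ProfilePageCache:
    """Reuse generated profile pages instead of regenerating them."""

    def __init__(self, generate_page):
        # generate_page(url) stands in for a profile agent that builds a
        # page from its profile database (hypothetical callable).
        self._generate_page = generate_page
        self._pages = {}

    def get(self, url):
        # Generate the page only on the first request for this URL.
        if url not in self._pages:
            self._pages[url] = self._generate_page(url)
        return self._pages[url]


calls = []  # records every time a page is actually generated


def generate_profile_page(url):
    calls.append(url)
    return "<html><body>profile for " + url + "</body></html>"


cache = ProfilePageCache(generate_profile_page)
first = cache.get("http://www.voiscorp.com/profile/calendar")
second = cache.get("http://www.voiscorp.com/profile/calendar")
```

In this sketch the second request is served from the cache, so the stand-in generator runs only once.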

In operation, a personal voice web service agent uses a corresponding service profile agent to retrieve subscriber and service specific attributes and preferences, speech training profiles and other information from the corresponding service profile database. The personal voice web service agent uses the retrieved subscriber and service specific information in personalizing the voice web service forms and pages as well as in enhancing and improving speech recognition by embedding the speech training profiles in the corresponding voice web forms and pages.

Referring back to FIG. 2B, for example, the calendar service agent 210 uses a corresponding calendar service profile agent 215 to retrieve subscriber specific calendar attributes and preferences included in profile database 216 by specifying the subscriber's calendar attributes and preferences profile URL as part of a profile request web form. Calendar service profile agent 215 responds to the submitted web form, retrieves the requested subscriber information from the calendar service profile database 216 and delivers it to calendar service agent 210 as a table formatted web page. Calendar service agent 210 retrieves the requested information from the table format in the web page and uses the subscriber's attributes and preferences to customize the voice web service form and page templates 213 before presenting them to the subscriber. In this way, the subscriber can have a personalized form or page presented to him/her without having to supply information about himself/herself repeatedly in each call.
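
The exchange described above, in which a profile agent returns a table formatted web page and the service agent reads values out of the table to customize a template, can be sketched with the Python standard library. The sketch rests on several assumptions: the two-column name/value table layout, the attribute names `name` and `first_day_of_week`, and the `$variable` template syntax are all hypothetical, since the specification does not define the table format or the template variable notation.

```python
from html.parser import HTMLParser
from string import Template


class TableRowParser(HTMLParser):
    """Collect the text of each <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag == "td":
            if not self.rows:  # tolerate a cell that appears before any row
                self.rows.append([])
            self._in_cell = True
            self.rows[-1].append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self.rows[-1][-1] += data


def parse_profile_table(page):
    """Turn a two-column name/value table into a dict."""
    parser = TableRowParser()
    parser.feed(page)
    return {row[0].strip(): row[1].strip()
            for row in parser.rows if len(row) == 2}


# Hypothetical table formatted page returned by the profile agent.
profile_page = """
<table>
  <tr><td>name</td><td>Pat</td></tr>
  <tr><td>first_day_of_week</td><td>Monday</td></tr>
</table>
"""

# Hypothetical page template with variables to be filled in.
template = Template("Calendar for $name. Week starts on $first_day_of_week.")

profile = parse_profile_table(profile_page)
page = template.safe_substitute(profile)
```

`safe_substitute` is used so that a variable missing from the profile is left in place rather than raising an error, which suits a template that may be filled in several passes.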

Similarly, calendar service agent 210 uses a corresponding calendar service profile agent 215 to retrieve subscriber specific calendar speech training profiles from profile database 216 by specifying the subscriber's calendar speech training profile URL as part of a profile request web form. Calendar service profile agent 215 responds to the submitted web form, retrieves the requested subscriber information from the calendar service profile database 216 and delivers it to the calendar service agent 210 as a table formatted web page. The calendar service agent 210 retrieves the requested information from the table format in the web page and embeds the subscriber's speech training profiles in the voice web form and page templates (pages 212, 213) before delivering them to the voice web browser. The voice web browser uses these speech training profiles to dynamically change the active vocabulary in the voice processing software and hardware, thereby customizing it to the subscriber.
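
The idea of switching the active vocabulary each time a page is played can be illustrated with a toy model. This is only a sketch under stated assumptions: the `Recognizer` class, the `play_page` function and the `training_words` key are hypothetical, and plain string matching stands in for real speaker dependent recognition, which compares acoustic input against the subscriber's training data.

```python
class Recognizer:
    """Stand-in for voice processing software with a switchable vocabulary."""

    def __init__(self):
        self.active_vocabulary = set()

    def load_vocabulary(self, words):
        # Replace, rather than extend, the active vocabulary so that only
        # the words relevant to the current page are candidates.
        self.active_vocabulary = {word.lower() for word in words}

    def recognize(self, utterance):
        """Return the word if it is in the active vocabulary, else None."""
        word = utterance.lower()
        return word if word in self.active_vocabulary else None


def play_page(recognizer, page):
    # `page` is a hypothetical dict; "training_words" stands in for the
    # speech training profile embedded in a voice web form or page.
    recognizer.load_vocabulary(page["training_words"])


recognizer = Recognizer()
play_page(recognizer, {"training_words": ["Tenth", "November", "Tuesday"]})
```

Restricting the candidates to the words embedded in the current page keeps the active vocabulary small, which is the customization effect the paragraph above describes.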

FIG. 2C is a functional block diagram of an alternative configuration of a voice web system in accordance with the present invention. The system includes a computer configured as a combined voice gateway and voice web site (combined site) 220. Combined site 220 includes gateway components such as a voice and telephony interface 114, a voice web browser 106 and server software 112. Combined site 220 additionally includes voice web site components such as service agents 201, service database 202 and service forms and pages 203. Combined site 220 provides voice web access to a subscriber 107 coupled to the combined site 220 via the PSTN 109. Because the voice gateway and voice web site functions are combined within a single computer
environment, the server software 112 (located in combined site 220) and the voice web browser 106 exchange files without suffering the delays imposed by routing across the Internet 101. In certain applications, for example when a subscriber is accessing personal databases, this configuration is advantageous to improve system performance. It should be noted, however, that even though server software 112 (located on combined site 220) and voice web browser 106 exchange files using a local interface as opposed to Internet 101, they nonetheless exchange files in accordance with HTTP.

Voice web browser 106 communicates with other web sites (such as web sites 224 and 225) using Internet 101. Web site 224 is a computer coupled to Internet 101 configured with server software 112, service agents 201, service database 202 and service forms and pages 203. Web site 224 is configured to deliver voice web services as described in reference to FIGS. 2A and 2B.

Web site 225 is a computer configured with server software 112, a profile service agent 223, service forms and pages 222 and profile database 221. Web site 225 is a universally accessible profile web site that is accessed by any other web site or web gateway in the voice web system as long as the accessing web site or web gateway has the appropriate URL information. Web site 225 provides user profile information to web site agents (such as service agents 201) located on other web sites (such as web site 224 and combined site 220). Advantageously, any web site and/or web gateway can thus access information stored in the profiles database 216 by hyperlinking to the web page associated with profile service agent 215.

User Authentication and Verification

Personal voice web system 300 uses a login agent as a gatekeeper to the access of each subscriber's personal voice web. The login agent is a distributed software program that can receive subscriber information over a telephone, access the subscriber's personal profile pages from the subscriber's personal voice web and verify the subscriber's credentials over the telephone.

Each system subscriber is given (i) an account number, (ii) a personal identification number (PIN) and (iii) a service calling number. In order to access a personal voice web, the subscriber calls the service calling number and uses account information and the PIN to initiate a subscriber authentication process. FIG. 6 is a flow diagram of a subscriber authentication method 600 in accordance with the present invention. The subscriber authentication method 600 includes authentication signature creation form processing and subscriber authentication processing.

A subscriber initiates access 601 of his or her personal voice web 300 by calling the service calling number using a conventional telephone or a similar voice activated device or computer configured to access the public telephone network. After the subscriber initiates access 601, a login agent starts login processing 602.

During login processing 602, the login agent answers the

directory. The login agent additionally verifies the PIN which was submitted. Upon verification of the PIN, the login agent presents 603 the subscriber's voice authentication form to the subscriber over the telephone. As part of the presentation, the login agent requests the subscriber to supply a personalized voice authentication sample. The login agent then waits 604 for the subscriber to supply the sample and submit 605 the form. After the subscriber submits 605 the form, the login agent processes 606 the submitted form. During processing 606 of the submitted form, the login agent accesses the subscriber's personal authentication page from the subscriber's personal voice web profile (linked to the subscriber's home page) and attempts to retrieve the voice authentication signature. If this is the first time the subscriber is accessing the service, the signature will be missing from the subscriber's authentication page. In this case, the login agent presents 607 the authentication signature creation form to the subscriber. Using the options presented in the signature creation form, the subscriber selects the option to create or modify the personal voice authentication signature. Following the instructions provided by the login agent, the subscriber fills in 608 the voice authentication signature creation form and records a personalized voice phrase as an authentication signature. After filling in 608 the signature creation form, the subscriber submits the form to the login agent. The login agent waits until the signature creation form is submitted 609. The login agent then processes 610 the recorded phrase, converting it into a signature pattern and linking it to the user authentication page as a MIME resource for future verification.

If, however, after processing 606, the login agent determines that there is an authentication signature stored in the subscriber's personal profile, then the login agent performs a test 611 to determine whether there is a match between the stored authentication signature and the voice sample submitted by the subscriber. If test 611 determines that there is a match between the sample and the signature, then the subscriber is given access to the personal voice web and the voice web. Test 611 uses conventional voice authentication methods. A "match" is determined by test 611 when the conventional voice authentication method determines that the speaker's voice print or voice signature matches a master stored voice print or voice signature within a specified tolerance. If test 611 determines that there is not a match between the sample and the signature, then the subscriber is denied access 613.

Enhanced Speech Recognition

Automatic speech recognition falls into three categories: speaker dependent, speaker adaptive, and speaker independent. Speaker dependent systems are developed to work for a single speaker and are usually easier to develop, cheaper to buy and more accurate, but require the use of user-specific speech training files.

The size of the vocabulary of a speech recognition system affects the complexity, processing requirements and the

call and presents a standard login form to the subscriber. A accuracy of the system. Referring now again to FIG. 3, 

login form is a voice form for collecting and submitting 60 personal voice web 300 uses small to medium sized vocabu- 

login information including subscriber account number and lanes (ten to hundred of words). 

the subscriber PIN. After a subscriber enters the login An isolated-word or discrete speech system operates on 

information (into the login form) and submits the login form, single words at a time requiring a pause between each word 

the login agent uses the login information to retrieve the utterance. This conventional type of speech recognition is a 

URL of the subscriber's personal voice web home page 301. 65 simple form of recognition to perform because the end 

The login agent retrieves the URL by looking up the points are easier to find and the pronunciation of a word 

subscriber's account number in the voice web subscriber tends not to affect others. As the occurrences of the words 
are more consistent and sharply delimited they are easier to recognize. Personal voice web 300 focuses on discrete speech and in particular on speech used for command and control.

Personal voice web 300 typically uses speech coded at 8 kHz using 8 bit samples resulting in 64 kbps bandwidth and storage. Conventional adaptive differential pulse code modulation (ADPCM) techniques can reduce the bandwidth to 16 kbps without loss of information.

Personal voice web 300 uses conventional speaker dependent recognition of discrete speech. This conventional speaker dependent recognition relies on digital sampling of the word utterances. After sampling, the next stage is acoustic signal processing. Most techniques include spectral analysis. This is followed by recognition of phonemes, groups of phonemes and words. This stage uses many conventional processes such as Dynamic Time Warping, Hidden Markov Modeling, Neural Networks, expert systems and combinations of techniques. Hidden Markov Modeling based techniques are commonly used and generally the most successful approach. Additionally, personal voice web 300 uses some knowledge of the language to aid the recognition process.

Personal voice web 300 improves speaker dependent recognition of discrete speech in a command and control context using universally accessible personal speech training profiles 401-427. As described above, the personal speech training pages 401-427 are organized as a linked collection of voice web profile pages each linked to the corresponding personal voice web service page. Thus, the personal speech training profile pages parallel the personal voice web service pages in structure as shown in FIGS. 3 and 5. Each speech training page 401-427 contains the training vocabulary for browser command and control that is context dependent.

Each service page 301-327 linked to the personal voice web home page 401 has a corresponding speech training page 402-427. The personal voice web 300 is constructed in such a way that each voice web service page 302-327 links to its corresponding speech training page 401-427 using its URL. As the subscriber navigates from service page to service page in the personal voice web 300, the system is able to access the corresponding speech training page using its embedded URL.

Each speech training page 401-427 contains a set of command and control key words and their personalized speech recognition patterns representing the context sensitive vocabulary for the corresponding service page. For example, the calendar and appointments service page 309 is linked to a corresponding speech training page 409 containing key words and recognition patterns for "year", "month", "day", the names of the months and days, digits representing dates and times etc. Similarly, stock portfolio page 311 is linked to a corresponding speech training page 411 containing key words and recognition patterns for "stock", "quote", "volume", "option", "symbol", names of companies in the portfolio etc.

FIG. 7 is a flow diagram of a speech recognition process 700 in accordance with the present invention. The process is initiated after a subscriber has gained access 701 to the personal voice web in accordance with the process described in reference to FIG. 6. Once the subscriber gains access to the personal voice web 701, the login agent accesses the subscriber's personal voice web home page and presents 702 the home page to the subscriber over the phone. During the process of presenting 702 the home page, the login agent loads the personal voice web profile page 302 and the speech profile page 501 containing the command and control vocabulary for the home page. This vocabulary includes the basic voice web browser command and control as well as home page specific command and control. From the home page, the subscriber requests a particular service (i.e. personal administrative assistant, the personal helpdesk or the personal catalog store). The home page agent determines 703 what service the subscriber has selected and in response, invokes 704 the selected service and then proceeds to deliver 705 the service. During invocation 704 of the service, both the service page and the speech training page associated with the service page are loaded on the voice web gateway where the voice web browser uses them to deliver the service and improve speech recognition.

During delivery 705 of the selected service, the service agent uses the speech training page associated with the selected service to recognize voice commands submitted 720 by the subscriber. Specifically, the service agent obtains the speech training profile, embeds it in the service page as a MIME resource and forwards it to the voice web browser which uses the training profiles to improve recognition. Thus, responding to the subscriber's voice commands pertinent to the accessed voice web service page, the voice web browser recognizes the command and control word utterances in the commands that are submitted 720, and matches them against personalized vocabulary in the corresponding voice web speech training page for accurate speaker dependent recognition of discrete speech.

If the subscriber requests access to a new service page linked to a currently accessible service page, the currently active service agent exits 706 the current service and then invokes 704 the requested service. During the invocation of the service, the requested voice web service page corresponding to the requested service is loaded as well as the corresponding speech training page containing the matching command and control vocabulary. In this process 700, the active service agent always uses the most appropriate vocabulary for the existing context, thereby greatly reducing the size of the active vocabulary that needs to be accessed while significantly improving the speaker dependent recognition.

Query localization and customization

Query customization uses stored subscriber attributes and preferences to customize queries of service databases. Query customization is accomplished by maintaining user attributes and preferences in a collection of voice web pages 501-527 (described above in reference to FIG. 5) that parallel the corresponding voice web service pages 301-327 (described above in reference to FIG. 6) and using the attribute and preferences information corresponding to the service requested to customize the query parameters within forms.

Referring now again to FIG. 5, the attributes and preferences pages 501-527 parallel the personal voice web service pages 301-327 in structure as shown in FIG. 3. Each service page linked to the personal voice web home page 301 has a corresponding voice web attributes and preferences page linked to it. The personal voice web 300 is constructed in such a way that each voice web service page 301-327 links to its corresponding voice web attributes and preferences page 501-527 using its URL. As the subscriber navigates from service page to service page in the personal voice web 300, the system is able to access the corresponding voice web attributes and preferences page using its embedded URL.
A subscriber of voice web services requests information by accessing a voice web service page and having it played by the corresponding agent (i.e. administrative assistant, helpdesk or commerce agent). The subscriber requests service through submitting a query form presented by the corresponding agent. The query form is an HVML form for touch tone and voice data input. When a service is requested by the subscriber, the agent retrieves the corresponding voice web attributes and preferences page and automatically fills the query form with appropriate default parameters obtained from the subscriber's attributes and preferences. For example, if the subscriber is accessing the weather service page, the agent fills in the subscriber's home town and other chosen cities automatically from the subscriber's attributes and preferences page. Similarly, if the subscriber is accessing the stock portfolio service page, the agent accesses the corresponding attributes and preferences page and fills in the subscriber's chosen portfolio of stocks in the query form. In addition, the agent also automatically fills in the appropriate subscriber attributes such as his/her access account number, password etc., thereby easing the subscriber's access while exploiting the availability of services through web based queries.

FIG. 8 is a flow diagram of a query customization process 800 in accordance with the present invention. The process is initiated after a subscriber has gained access 801 to the personal voice web in accordance with the process described in reference to FIG. 6. Once the subscriber gains access 801 to the personal voice web, the login agent accesses the subscriber's personal voice web home page and presents 802 the home page to the subscriber over the phone.

During the process of presenting 802 the home page, the login agent loads the attributes and preferences page 501 from the subscriber's voice web personal profile. Attributes and preferences page 501 contains preferences for the home page 301. From the home page 301, the subscriber accesses the targeted voice web service page by navigating the appropriate hyper links from the voice web home page 301. In response, the selected service is invoked 803 and the selected service then proceeds to deliver 804 the service. During invocation 803 of the selected service, both the service page and the attributes and preferences page associated with the service page are extracted by the service agent.

During delivery 804 of the selected service, the service agent uses the attributes and preferences page associated with the selected service to customize queries of the associated service database. More specifically, using the attributes and preferences information, the service agent automatically fills in the needed fields in the corresponding query form with user specified defaults and preferences. Having filled the appropriate fields, the service agent plays the remaining query form to the subscriber, thereby greatly reducing the information that the subscriber has to supply on the telephone. The service agent then obtains the remaining information, if any, from the subscriber and submits the query form to the service database. When the results are returned (i.e. the information is retrieved from the service database), the service agent plays the results to the subscriber over the telephone.

Form Based Voice Web Page Publishing

In another aspect of the invention, voice web system 100 enables publishers to compose voice web forms and pages statically using ordinary word processing programs and link them to voice files created using ordinary audio capture and editing tools available on personal computers and workstations. Alternatively, voice web agents can dynamically compose voice web pages and forms based on user requests and optionally profiles as well as accessed databases and services. Advantageously, dynamic form-based publication enables information and service providers to publish voice web pages using the conventional telephone without the need for any additional computer based voice web publishing tools. Dynamic form-based publication is achieved by combining voice web publishing forms, voice web publishing agents and voice web page publishing templates.

FIG. 9 is a flow diagram of a voice publishing method in accordance with the present invention. The method presents 901 a voice web form to a caller calling into a voice web system using a conventional telephone. Voice web publishing forms are specially designed voice web forms that when interpreted (i.e. when played back) using the voice browser prompt the caller (the voice information publishers) to input voice and touch tone based input using a telephone. The forms guide the caller step by step to supply the needed information, edit and modify the information and finally submit 903 the information for processing 902.

Voice web publishing agents process 902 the filled voice web publishing forms, extracting and separating voice information and touch tone input. Based on the touch tone inputs, the agents may present additional publishing forms to the caller (publisher). The voice information is stored 904 in voice files and linked to the corresponding voice web page publishing template by substituting variables within the page template with the generated files. The touch tone input is used whenever the caller (publisher) needs to input alphanumeric information that can be processed by the publishing agent.

Voice Web White, Yellow and Order Pages

Without limiting the general applicability of form based voice web page publishing, a specific application of the process of form based publishing is next described. The form based publishing process relates to the creation of voice web business white pages, yellow pages and order entry pages. FIG. 10 shows a white-yellow-order system 1000 in accordance with the present invention. Voice web business white pages 1001 are voice web pages that are dynamically composed by the voice web business white pages agent 1003 from a business white page database 1002 containing information including the name, address and phone number of businesses. The white pages agent 1003 presents a query form to a caller for specifying the name of the business and allows further narrowing of the search by city and state. Each business white page can be linked to a corresponding business yellow page 1004. Business yellow pages 1004 contain additional information about the business including an advertisement, directions, working hours, and promotions. In addition, each yellow page 1004 can be linked to a corresponding business order entry form 1005. Business order entry forms 1005 allow users to order products and services or transact business by supplying product or service code, preferences, quantity, and credit card numbers for payment.

A participating business can publish a voice web yellow page 1004 by simply filling in a corresponding voice web yellow page publishing form 1007. A yellow page publishing agent 1006 processes the yellow page publishing form 1007 and dynamically generates a business yellow page 1004 for that business from a standard yellow page template by replacing variables in the template with values supplied by the submitted yellow page publishing form.
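
The replacement of template variables with submitted form values can be sketched as follows. This is only a minimal illustration: the `$`-style placeholder syntax, the field names and the file paths are assumptions made for the example and are not defined by the specification.

```python
from string import Template

# Hypothetical yellow page template. The $-placeholders, field names and
# wording are illustrative assumptions, not part of the disclosed system.
YELLOW_PAGE_TEMPLATE = Template(
    '<VOICE TYPE="File" SRC="$name_src" TEXT="$name_text">\n'
    '<VOICE TYPE="File" SRC="$hours_src" TEXT="Working hours">\n'
    '<A HREF="$order_form_url" TONE="1">Press 1 to place an order</A>\n'
)


def generate_yellow_page(form_values):
    """Fill the template with values taken from a submitted publishing form.

    Template.substitute raises KeyError when a variable has no value, so an
    incomplete form is rejected instead of producing a broken page.
    """
    return YELLOW_PAGE_TEMPLATE.substitute(form_values)


page = generate_yellow_page({
    "name_src": "acme/name.wav",
    "name_text": "Acme Hardware",
    "hours_src": "acme/hours.wav",
    "order_form_url": "acme/order.vml",
})
print(page)
```

A stricter agent might validate each value (for example, checking that a recorded voice file exists) before substitution; the sketch leaves that out.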
The yellow page publishing agent 1006 (a publishing
agent) presents a yellow page voice web publishing form
1007 to the participating business. Voice web publishing
forms are specially designed voice web forms that when
interpreted (i.e. when played back) using the voice browser
prompt the caller (the voice information publishers) to input
voice and touch tone based input using a telephone. Yellow
page publishing form 1007 guides the caller step by step to
supply the needed information, edit and modify the
information and finally submit the information for processing, as
described in reference to FIG. 9. Specifically, yellow page
publishing form 1007 prompts for voice information
including name, tag line, advertisement, directions, working hours
and promotions. In addition, the yellow page publishing
agent 1006 prompts for touch tone input including the
account number, password, phone number, yellow page
category code and credit card number. Yellow page
publishing agent 1006 uses the account number to identify the
business, the password to verify the business, the phone
number to link it to the corresponding white page, the yellow
page category code to classify the business within business
yellow pages, and the credit card number to pay for the
business yellow page. Once the business is identified and
verified, yellow page publishing agent 1006 dynamically
creates a business yellow page 1004 from a standard
template for the appropriate category. Yellow page publishing
agent 1006 uses the supplied business phone number to
match with the appropriate database entry in the business
white pages and updates it with the URL of the newly
created yellow page to link it.

A very similar process occurs for publishing order entry
forms. A business order entry form publishing agent, order
page publishing agent 1008, presents an appropriate order
entry publishing form 1009 to a participating business.
Order page publishing agent 1008 requests appropriate
customized prompts for specific fields in the business order
entry form such as product or service code, customer
preferences, quantity, credit card number etc. Order page
publishing agent 1008 also requests touch tone input for
the account number, password, phone number, and credit 
card number. Order page publishing agent 1008 uses the 
account number and password for identification and 
verification, the phone number to link it to the corresponding 
yellow page 1004 and the credit card number for payment 
for the order entry form. Once the business is identified and 
verified, order page publishing agent 1008 dynamically 
generates an order entry form for that business by filling the 
supplied information into a standard order entry template for 
that business category. Order page publishing agent 1008 
uses the supplied business phone number to match with the 
appropriate database entry in the business white pages, 
updates it with the URL of the newly created order entry 
page, locates the corresponding yellow page using its URL 
in the database, and updates it to link to the newly created 
order entry page. 

The foregoing discussion discloses and describes merely 
exemplary embodiments of the present invention. As will be 
understood by those familiar with the art, the invention may 
be embodied in other specific forms without departing from 
the spirit or essential characteristics thereof. Accordingly, 
the disclosure of the present invention is intended to be 
illustrative, but not limiting, of the scope of the invention, 
which is set forth in the following claims. 
I. HVML Specification

Hyper Voice Markup Language consists of a set of extensions to existing HTML. Some 
of the extensions are new elements with new tags and attributes. Others are extensions to 
existing elements in the form of new attributes. All attribute values are shown as %value 
type%. 

In-line Voice components 

The primary mechanism for introducing voice prompts into an HTML page is a new 
inline voice HVML element similar to the inline image HTML element. The tag for this 
element is "VOICE" and it has many variations. Each variation is specified by value of 
the TYPE attribute. Depending on the type, each variation has additional attributes. 
Voice Files 

<VOICE TYPE="File" SRC="%URL%" TEXT="%text%">

VOICE tag with TYPE set to "File" indicates a file containing pre-recorded voice
information. Its attributes are SRC and TEXT. The SRC attribute specifies the URL for the
voice file and the TEXT attribute, which is optional, specifies the text that can be translated
to speech as an alternative to the voice file.

Voice Index Files 

<VOICE TYPE="Index" SRC="%URL%" INDEX="%index%" TEXT="%text%">
VOICE tag with TYPE set to "Index" indicates an indexed file containing pre-recorded
voice phrases. Its attributes are SRC, INDEX and TEXT. SRC and TEXT have the same
meaning as in Voice Files. The INDEX attribute specifies the index of the phrase within the
file either as a number or a label.
For example:

<VOICE TYPE="File" SRC="myweb/home/greeting.wav">
Text-to-Speech 

<VOICE TYPE="Text" TEXT="%text%">

VOICE tag with TYPE set to "Text" indicates a text-to-speech string. Its attribute is
TEXT which specifies the string that needs to be translated to speech.
For example:

<VOICE TYPE="Text" TEXT="Welcome to your Home Page">
Voice Streams: 

<VOICE TYPE="Stream" VALUE="%URL%" TERMINATE="%tone%">

VOICE tag with TYPE set to "Stream" indicates a continuous voice stream identified by
its URL. The browser accesses the voice stream and continuously plays it to the user. Its
attribute is TERMINATE which specifies the tone the user can enter to terminate the
playback.

Currency 

<VOICE TYPE="Money" VALUE="%number%" FORMAT="%format%">
VOICE tag with TYPE set to "Money" indicates a number that needs to be presented as
currency. Its attributes are VALUE and FORMAT. VALUE specifies the decimal value
of the number and FORMAT, which is optional, specifies the currency type such as "US
Dollar", "British Pound" etc. The default value for FORMAT is "US Dollar".
Numbers 

<VOICE TYPE="Number" VALUE="%number%" FORMAT="%format%">

VOICE tag with TYPE set to "Number" indicates a number that needs to be presented as
a decimal number. Its attributes are VALUE and FORMAT. VALUE specifies the
decimal value and FORMAT, which is optional, specifies the precision to be conveyed.
Digits after the decimal point are pronounced as characters. The default value for
FORMAT is 2, which indicates 2 digit precision after the decimal point.
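
As a rough illustration of the "Number" presentation rule, the sketch below splits a value into the tokens a voice browser might speak: the whole part as one number, then each digit after the decimal point on its own. The function name, the token list, the word "point" and the use of rounding (rather than truncation) to reach the requested precision are all assumptions; the specification only fixes the default precision of 2 and the digit-by-digit reading.

```python
from decimal import Decimal


def number_tokens(value, precision=2):
    """Return hypothetical speech tokens for <VOICE TYPE="Number">.

    value     -- the VALUE attribute as a decimal string, e.g. "12.5"
    precision -- the FORMAT attribute (digits after the decimal point)
    """
    # Decimal(1).scaleb(-2) is Decimal("0.01"), the quantum for 2 digits.
    quantum = Decimal(1).scaleb(-precision)
    text = format(Decimal(value).quantize(quantum), "f")
    whole, _, fraction = text.partition(".")
    tokens = [whole]
    if fraction:
        tokens.append("point")
        tokens.extend(fraction)  # one token per digit after the point
    return tokens


print(number_tokens("12.5"))        # whole part, "point", then single digits
print(number_tokens("3.14159", 3))
```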

Characters 

<VOICE TYPE="Character" VALUE="%string%">

VOICE tag with TYPE set to "Character" indicates a sequence of characters that are to be
presented separately with no pauses in between. Its attribute is VALUE which specifies
the sequence of characters as a string.
Dates 

<VOICE TYPE="Date" VALUE="%date%" FORMAT="%format%">

VOICE tag with TYPE set to "Date" indicates an expression that is to be presented as a
date. Its attributes are VALUE and FORMAT. The VALUE attribute specifies the expression
and the FORMAT attribute, which is optional, specifies the format of the expression.
The default format is MM/DD/YY.

Ordinals 

<VOICE TYPE="Ordinal" VALUE="%number%">

VOICE tag with TYPE set to "Ordinal" indicates a number that is to be presented as an
ordinal (i.e. as Nth value). Its attribute is VALUE which specifies the number. Values
are pronounced as "first", "second", "third" etc.
Strings: 

<VOICESTRING NAME="%name%">
. . . Voice Components . . .
</VOICESTRING>

VOICESTRING tag indicates a sequence of voice components that are grouped together
for presentation without any pauses in between. Each of the voice components can be
any of the primitives previously defined. The voice browser gathers the individual
components and plays them together in sequence.
<VoiceString NAME="welcome">
<Voice TYPE="Index" SRC="welcome.vap" INDEX="begin" TEXT="Welcome">
<Voice TYPE="File" SRC="username.vox" TEXT="user's name">
<Voice TYPE="Index" SRC="welcome.vap" INDEX="end" TEXT="to VOIS NET">
</VoiceString>

The voice browser "plays" each in-line voice component in sequence as it encounters it in 
the HVML page starting from the beginning of the page. Each voice component is played
only once for each presentation. A "reload" command would cause the voice browser to 
re-play the page. 
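
The in-order playback described above can be sketched with Python's standard `html.parser`. This is a minimal illustration under stated assumptions: the class and function names are made up for the example, and "playing" a component is reduced to listing its type together with its source, text or value.

```python
from html.parser import HTMLParser


class VoiceCollector(HTMLParser):
    """Collect VOICE components in the order they appear in an HVML page."""

    def __init__(self):
        super().__init__()
        self.components = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names.
        if tag == "voice":
            self.components.append(dict(attrs))


def play_order(hvml_text):
    """Return (type, source-or-text) pairs in presentation order."""
    collector = VoiceCollector()
    collector.feed(hvml_text)
    collector.close()
    return [
        (c.get("type"), c.get("src") or c.get("text") or c.get("value"))
        for c in collector.components
    ]


SAMPLE = """<VoiceString NAME="welcome">
<Voice TYPE="Index" SRC="welcome.vap" INDEX="begin" TEXT="Welcome">
<Voice TYPE="File" SRC="username.vox" TEXT="user's name">
<Voice TYPE="Text" TEXT="to VOIS NET">
</VoiceString>"""

for component in play_order(SAMPLE):
    print(component)
```

A "reload" command would simply call `play_order` again on the same page text.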

Of course, voice elements can also be invoked by hyper links pointing to voice files 
containing digitized voice data. This is similar to existing HTML conventions. The voice 
browser simply fetches the new page and plays it once. In the next section, we will 
discuss how hyperlinks can be invoked using touch tone or key word input. 
Voice responsive labels for hyper-links

In order to invoke hyper links embedded in a HVML page, two new attributes "TONE"
and "LABEL" are added to the anchor element. These attributes are used in conjunction
with the existing HREF attribute in an anchor element that makes the anchor into a hyper
link. When the user selects the touch tone signals specified by the value of the TONE
attribute followed by the "#" tone, or utters the word specified by the LABEL attribute,
the browser invokes the corresponding hyper link. The TONE and LABEL attribute
values must be unique within a page.
For example: 

<A HREF="myweb/home/greeting.vml" TONE="HELLO">
or

<A HREF="myweb/home/greeting.vml" LABEL="HELLO">

When the user presses "H,E,L,L,O,#" on the touch tone phone or the user says the
word "HELLO" on the phone, the browser will invoke the corresponding hyper link and
access the "greeting.vml" page.
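
One way a browser could dispatch on TONE and LABEL values is sketched below. Several points are assumptions rather than part of the specification: the letter-to-key mapping follows the common telephone keypad layout (with Q on 7 and Z on 9), anchors are passed in as plain dictionaries, and a duplicate value is reported by raising `ValueError` to reflect the uniqueness rule.

```python
# Common telephone keypad lettering (an assumption for this sketch).
KEYPAD = {"2": "ABC", "3": "DEF", "4": "GHI", "5": "JKL",
          "6": "MNO", "7": "PQRS", "8": "TUV", "9": "WXYZ"}
LETTER_TO_DIGIT = {letter: digit
                   for digit, letters in KEYPAD.items()
                   for letter in letters}


def tone_digits(tone_value):
    """Translate a TONE value such as "HELLO" into the keys pressed."""
    return "".join(LETTER_TO_DIGIT.get(ch, ch) for ch in tone_value.upper())


def build_link_table(anchors):
    """Index anchors by tone sequence and by spoken label."""
    tones, labels = {}, {}
    for anchor in anchors:
        if "TONE" in anchor:
            key = tone_digits(anchor["TONE"])
            if key in tones:
                raise ValueError("TONE values must be unique within a page")
            tones[key] = anchor["HREF"]
        if "LABEL" in anchor:
            key = anchor["LABEL"].upper()
            if key in labels:
                raise ValueError("LABEL values must be unique within a page")
            labels[key] = anchor["HREF"]
    return tones, labels


def resolve(tones, labels, pressed=None, spoken=None):
    """Return the HREF to invoke, or None when nothing matches."""
    if pressed is not None and pressed.endswith("#"):
        return tones.get(pressed[:-1])  # the "#" tone ends the sequence
    if spoken is not None:
        return labels.get(spoken.upper())
    return None


tones, labels = build_link_table([
    {"HREF": "myweb/home/greeting.vml", "TONE": "HELLO", "LABEL": "HELLO"},
])
print(resolve(tones, labels, pressed="43556#"))
print(resolve(tones, labels, spoken="hello"))
```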

Keyword accessible indexes for anchors 

HTML allows the index access of fragments within a page by unique labels associated 
with anchors surrounding the fragment. The NAME attribute in an anchor element 
specifies a label that is unique within the page. This label can then be used as an index by 
the browser to search for the fragment by matching the unique label with the one supplied
in the hyperlink. The hyperlink for the indexed fragment uses the regular URL for the
page concatenated with the fragment's unique label with a "#" separator.
Coupled with voice responsive hyper links, fragment labels can be used to construct
simple menus or database searches.

For example:

Suppose "myweb/home/prompts.vml" contains the following HVML text.

<A NAME="prompt1">
<VOICE TEXT="Press CAL# for Calendar">
</A>
<A NAME="prompt2">
<VOICE TEXT="Press ADDR# for Address Book">
</A>
<A NAME="prompt3">
<VOICE TEXT="Press EMAIL for Electronic Mail">
</A>

Suppose another HVML page contains the following hyperlinks:

<A HREF="myweb/home/prompts.vml#prompt1" TONE="1">Press 1 to hear Prompt1</A>
<A HREF="myweb/home/prompts.vml#prompt2" TONE="2">Press 2 to hear Prompt2</A>
<A HREF="myweb/home/prompts.vml#prompt3" TONE="3">Press 3 to hear Prompt3</A>

Then, if the user presses "1,#", the browser will fetch the "myweb/home/prompts.vml"
HVML page, match "prompt1" index with the first anchor's "prompt1" label, and start
presenting the prompts starting with text-to-speech translation of "Press CAL# for
Calendar".
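
The fragment lookup can be sketched as follows. This is an illustration under assumptions: `fetch_page` is a hypothetical callable that returns the HVML text for a URL, and presentation is taken to continue from the matched anchor to the end of the page, which is one reading of "start presenting the prompts".

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag


class PromptScanner(HTMLParser):
    """Collect VOICE prompts that follow the anchor with the given NAME."""

    def __init__(self, label):
        super().__init__()
        self.label = label
        self.started = label == ""  # no fragment: start at the page top
        self.prompts = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)  # names arrive lower-cased
        if tag == "a" and attrs.get("name") == self.label:
            self.started = True
        elif tag == "voice" and self.started:
            self.prompts.append(attrs.get("text"))


def prompts_from(href, fetch_page):
    """Split the href at "#", fetch the page, and list prompts to present."""
    url, label = urldefrag(href)
    scanner = PromptScanner(label)
    scanner.feed(fetch_page(url))
    scanner.close()
    return scanner.prompts


PAGE = """<A NAME="prompt1">
<VOICE TEXT="Press CAL# for Calendar">
</A>
<A NAME="prompt2">
<VOICE TEXT="Press ADDR# for Address Book">
</A>"""

print(prompts_from("myweb/home/prompts.vml#prompt1", lambda url: PAGE))
```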
Browser Control 

<PAUSE TIMEOUT="%seconds%" TERMINATE="%tone%">
To let the voice page publisher control the behavior of the voice browser,
HVML defines a tag "PAUSE" with "TIMEOUT" and "TERMINATE" attributes. When
the browser encounters a PAUSE statement, it pauses until either the amount of time
specified in the TIMEOUT attribute elapses or the user enters the tone specified in the
"TERMINATE" attribute. If the value of the TIMEOUT attribute is 0, then the browser
waits there indefinitely. The default value for TIMEOUT is 1 second. The default value for
TERMINATE is "#".
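
The PAUSE rules can be illustrated without real timers by replaying a list of timestamped key presses. The function name, the event format and the returned status strings are assumptions made for this sketch; only the defaults (1 second, "#") and the meaning of a TIMEOUT of 0 come from the text above.

```python
def pause_end(events, timeout=1.0, terminate="#"):
    """Decide when a PAUSE ends.

    events  -- (seconds_since_pause_started, tone) pairs in time order
    timeout -- TIMEOUT attribute; 0 means wait indefinitely
    Returns (time, reason); time is None if the browser is still waiting.
    """
    deadline = None if timeout == 0 else float(timeout)
    for when, tone in events:
        if deadline is not None and when >= deadline:
            break  # the timeout elapsed before this key press
        if tone == terminate:
            return when, "terminated"
    if deadline is None:
        return None, "waiting"
    return deadline, "timeout"


print(pause_end([(0.2, "5"), (0.4, "#")]))   # ended by the "#" tone
print(pause_end([(2.0, "#")]))               # default 1 second timeout
print(pause_end([(30.0, "#")], timeout=0))   # waits until "#" arrives
```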
Voice Responsive Forms 

HVML uses the FORM tag to enable user input similar to HTML including the 
METHOD attribute which specifies the way parameters are passed to the server and the 
ACTION attribute which specifies the procedure to be invoked by the server to process 
the form. HVML extends the INPUT tag within forms by introducing VOICEINPUT tag. 
VOICEINPUT takes a TYPE attribute similar to the INPUT tag with three new values 
"voice", "tone" and "review" in addition to the existing "reset" and "submit" values. 
The HVML browser pauses at each VOICEINPUT statement in a HVML form until the 
specified input is supplied or input is terminated before processing the remaining form. 
The VOICEINPUT tag with TYPE value set to "voice" indicates a form that accepts 
voice input. Usually, a voice prompt or text-to-speech segment precedes the 
VOICEINPUT tag alerting the user that input is required and how to terminate input. The 
user is expected to speak and this message is recorded in real-time and supplied to the 
Voice Web server for processing. The VOICEINPUT tag containing "voice" value for the 
TYPE attribute also supports a MAXTIME attribute which specifies the maximum
recording time for the message and a TERMINATE attribute which specifies the touch 
tone that terminates input. If the MAXTIME attribute is not specified, then the default 
value of "15" is assumed. If TERMINATE attribute is not specified, then the default 
value of "#" is assumed. For example, if the MAXTIME value is 20 and TERMINATE 
value is "#", then recording terminates when the user presses "#" or 20 seconds of time 
elapses. 

The VOICEINPUT tag with TYPE value set to "tone" indicates a form that accepts touch 
tone input. Again, a voice prompt or a text-to-speech segment precedes the 
VOICEINPUT tag alerting the user for input. The user is expected to press a sequence of 
touch tones which are recorded and supplied to the Voice Web server for processing. The 
VOICEINPUT tag containing "tone" value for the TYPE attribute also supports a 
MAXDIGITS attribute which specifies the maximum number of touch tone digits that 
can be supplied and a TERMINATE attribute which specifies the touch tone that 
terminates input. If the MAXDIGITS attribute is not specified, then the default value of 
"20" is assumed. If TERMINATE attribute is not specified, then the default value of "#" 
is assumed. For example, if the MAXDIGITS value is 10 and TERMINATE value is "#", 
then the input process terminates when the user presses "#" or 10 digits are supplied. 
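The two termination rules for "voice" and "tone" input can be sketched as follows. This is an illustration under stated assumptions, not the browser's actual implementation: the function names and the modelling of key presses as a string are hypothetical, and only the default values (15, 20 and "#") come from the text.

```python
from typing import Optional


def recording_seconds(terminate_at: Optional[float],
                      maxtime: float = 15.0) -> float:
    """Length of a "voice" recording.

    `terminate_at` is the time at which the user pressed the TERMINATE
    tone, or None if the tone was never pressed. Recording stops at that
    time or at MAXTIME, whichever comes first.
    """
    if terminate_at is None:
        return maxtime
    return min(terminate_at, maxtime)


def collect_tones(keys: str, maxdigits: int = 20,
                  terminate: str = "#") -> str:
    """Collect "tone" input from a string of pressed keys.

    Collection stops when the TERMINATE tone is pressed or when
    MAXDIGITS tones have been collected, whichever comes first.
    """
    collected = []
    for key in keys:
        if key == terminate:
            break
        collected.append(key)
        if len(collected) == maxdigits:
            break
    return "".join(collected)
```

For instance, `collect_tones("4155551234#99", maxdigits=10)` yields `"4155551234"`, and `recording_seconds(25.0, maxtime=20)` is capped at 20 seconds.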
The VOICEINPUT tag with TYPE value set to "review" indicates that the current values 
of the form can be reviewed by selecting the "review" input. The VOICEINPUT tag with 
TYPE value set to "reset" indicates that the current values of the form should be reset to 
their original defaults. The VOICEINPUT tag with TYPE value set to "submit" indicates 
that the current form should be submitted to the server. Each of these three TYPE values 
supports a SELECTTONES attribute and a SKIPTONES attribute. The SELECTTONES 
attribute specifies the sequence of touch tones that activates the corresponding selection. 
The SKIPTONES attribute specifies the sequence of touch tones that skips the selection. If the 
SELECTTONES attribute is not specified, then the default value of is assumed and 
if the SKIPTONES attribute is not specified, then the default value of "*" is assumed. 
For example, if the SELECTTONES attribute value is "REVIEW" and SKIPTONES 
attribute value is "SKIP" for a VOICEINPUT element with TYPE value set to "review", 
the user can enter "REVIEW" to review the form values or enter "SKIP" to skip the 
selection. A VOICEINPUT tag with TYPE value set to "submit" similarly indicates that the 
values of the form can be submitted to the server. If the SELECTTONES attribute value 
is "DONE" and the SKIPTONES attribute value is "**", the user can either enter 
"DONE" to submit the form or press "**" to skip the selection. A VOICEINPUT tag with 
TYPE value set to "reset" similarly indicates that the values of the form are to be reset to 
their original values. 
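The behavior of the "review", "reset" and "submit" controls can be sketched as below. The class `VoiceForm`, its `handle` method, and the field name `zip` used in the usage note are hypothetical. The sketch also assumes that any entry other than the SELECTTONES sequence (including the SKIPTONES sequence) leaves the form untouched, which the text does not spell out.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class VoiceForm:
    """Hypothetical holder for the current and default values of a form."""
    defaults: Dict[str, str]
    values: Dict[str, str] = field(default_factory=dict)

    def __post_init__(self) -> None:
        if not self.values:
            self.values = dict(self.defaults)

    def handle(self, control_type: str, entered: str,
               selecttones: str, skiptones: str = "*") -> Optional[Dict[str, str]]:
        """Apply one "review", "reset" or "submit" control.

        Returns a copy of the form values when the control is selected
        (to be read back for "review" or sent to the server for "submit"),
        and None when the selection is skipped or not recognized.
        """
        if entered != selecttones:
            return None                        # skipped (e.g. skiptones) or unknown
        if control_type == "reset":
            self.values = dict(self.defaults)  # restore the original defaults
        return dict(self.values)
```

As a usage example, after `form = VoiceForm(defaults={"zip": ""})` and `form.values["zip"] = "94040"`, calling `form.handle("review", "REVIEW", "REVIEW", "SKIP")` returns the current values, whereas entering `"SKIP"` returns `None` and changes nothing.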

II. Voice Browser Commands 



All browser commands must start with the "*" key. Each browser command is associated 
with one or more key words that uniquely identify it. For example, in order to activate the 
"Home" command, the user would press "*home" on the telephone key pad. The key 
words are chosen in such a way as to generate unique touch tone sequences. A set of default 
browser commands is listed below with the keyword and description of each command. 
Alternatively, the browser commands can also be issued by vocalizing the corresponding 
commands. For example, to activate the "Home" command, the user would say "home" 
on the telephone. 
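The mapping from key words to touch tones follows the standard telephone key pad lettering (2 = ABC through 9 = WXYZ). The sketch below shows how a key word such as "*home" translates into the keys actually pressed, and how a set of key words can be checked for colliding tone sequences. The function names `to_tones` and `find_collisions` are assumptions made for this illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List

# Standard telephone key pad lettering.
KEYPAD = {"2": "abc", "3": "def", "4": "ghi", "5": "jkl",
          "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz"}
LETTER_TO_DIGIT = {ch: digit for digit, letters in KEYPAD.items() for ch in letters}


def to_tones(keyword: str) -> str:
    """Translate a key word into key presses; non-letters pass through unchanged."""
    return "".join(LETTER_TO_DIGIT.get(ch, ch) for ch in keyword.lower())


def find_collisions(keywords: Iterable[str]) -> Dict[str, List[str]]:
    """Return the tone sequences that are produced by more than one key word."""
    by_tones: Dict[str, List[str]] = defaultdict(list)
    for keyword in keywords:
        by_tones[to_tones(keyword)].append(keyword)
    return {tones: kws for tones, kws in by_tones.items() if len(kws) > 1}
```

Here `to_tones("*home")` gives `"*4663"`, matching the sequence listed for the Home command. A hypothetical key word "*in" would collide with "*ho", since both map to `"*46"`.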
Previous 

Jump to the previous page from which the current page was accessed via a hyper 
link. This command is activated by pressing "*pr" (*77) or "*prev" (*7738) 
sequence. 
Next 

Jump to the next page in a sequence of hyper links. This command is activated by 
pressing "*n" (*6) or "*next" (*6398) sequence. 

History 

Present the titles of the pages accessed so far in the order of their hyper link 
access sequence. Pause after each title. If the user presses "#", then jump to the 
page specified by the title. If not, proceed to the next title. This command is 
activated by pressing "*hi" (*44) or "*hist" (*4478) sequence. 
Home 

Jump to the first page in the sequence of hyper links. This command is activated 

by pressing "*ho" (*46) or "*home" (*4663) sequence. 

Reload 

Reload the current page from the Web server. This command is activated by 
pressing "*re" (*73) or "*relo" (*7356) sequence. 

Help 

Jump to the home page of the help page set. Help pages are navigated in exactly 
the same way as ordinary HVML pages. However, a new browser instance is 
created on activation which must be "exited" to get back to the page context from 
which the "Help" page set was accessed. This command is activated by pressing "*h" 
(*4) or "*help" (*4357) sequence. 
Fax 

Jump to the home page of the Fax dialog session using HTML forms. Again, a 
new browser instance is created on activation which must be "exited" to get back 
to the page context from which "Fax" dialog session was activated. This 
command is activated by pressing "*fa" (*32) or "*fax" (*329) sequence. 
Stop 

Stop loading the page that is currently being accessed. This command is activated 

by pressing "*t" (*8) or "*stop" (*7867) sequence. 

Exit 

Exit the current instance of the browser and return to the page being accessed in 
the previous instance of the browser. If this is the first instance of the browser, 
then exit the browser and hang-up the phone. This command is activated by 
pressing "*x" (*9) or "*exit" (*3948) sequence. 
Bookmarks 

Present the titles of the pages selected as bookmarks in the order of their hyper 
link access sequence. Pause after each title. If the user presses "#", then jump to 
the page specified by the title. If not, proceed to the next title. This command is 
activated by pressing "*bo" (*26) or "*book" (*2665) sequence. 
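The relationship between the Help, Fax and Exit commands can be pictured as a stack of browser instances, each with its own page history. The sketch below is one possible reading of that behavior: the class `BrowserSession`, its method names, and the page names in the usage note are all hypothetical.

```python
from typing import List, Optional


class BrowserSession:
    """A stack of browser instances; each instance keeps its own page history."""

    def __init__(self, home_url: str) -> None:
        self.instances: List[List[str]] = [[home_url]]

    def current_page(self) -> Optional[str]:
        """Page being accessed, or None once the call has ended."""
        return self.instances[-1][-1] if self.instances else None

    def follow(self, url: str) -> None:
        """Follow a hyper link within the current instance."""
        self.instances[-1].append(url)

    def previous(self) -> None:
        """Return to the page from which the current page was accessed."""
        history = self.instances[-1]
        if len(history) > 1:
            history.pop()

    def open_nested(self, home_url: str) -> None:
        """Start a new instance, as the Help and Fax commands do."""
        self.instances.append([home_url])

    def exit(self) -> bool:
        """Leave the current instance; True means the phone should hang up."""
        if self.instances:
            self.instances.pop()
        return not self.instances
```

For example, after following a link from `"home.hvml"` to `"news.hvml"` and opening a nested Help instance, the first `exit()` returns `False` and restores `"news.hvml"`; a second `exit()` returns `True`, which corresponds to leaving the first instance and hanging up.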

III. Voice Browser Playback Controls 

When the voice browser is activated to play back voice prompts or speech segments, an 
additional set of browser commands is available to the user to control the playback. 
Pause 

Pause the play back at current position. This command is activated by pressing 
"*p" (*7) or "*pause" (*72873). 
Play 

Continue play back from current position. This command is activated by pressing 
"*p" (*7) or "*play" (*7529). 

Backup 

Back up the play back position by 5 seconds and start play back. The command is 
activated by pressing "*b" (*2) or "*back" (*2225). Repeated pressing of the 
same tone implies successive back up by 5 seconds for each tone. 
Forward 

Forward the play back position by 5 seconds and start play back. The command is 
activated by pressing "*f" (*3) or "*frwd" (*3793). Repeated pressing of the same 
tone implies successive skip forward by 5 seconds for each tone. 
Start 

Back up the play back position to the beginning of the play back sequence and 
start play back. The command is activated by pressing "*0". 

End 

Jump to the end of the play back sequence, backup by 5 seconds and start play 
back. The command is activated by pressing "*3". 
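The position arithmetic behind the Backup, Forward, Start and End controls can be sketched as follows. The function name `apply_control` and the command strings are hypothetical, and clamping the position to the range from 0 to the duration is an assumption; the text only states the 5 second step.

```python
STEP = 5.0  # seconds moved per Backup or Forward press, as stated in the text


def apply_control(position: float, duration: float, command: str) -> float:
    """Return the new play back position, in seconds, after one control.

    Positions are clamped to [0, duration] (an assumption of this sketch).
    """
    if command == "backup":
        return max(0.0, position - STEP)
    if command == "forward":
        return min(duration, position + STEP)
    if command == "start":
        return 0.0
    if command == "end":
        return max(0.0, duration - STEP)  # jump to the end, then back up 5 s
    raise ValueError(f"unknown playback command: {command!r}")
```

Repeated presses compose naturally: applying `"backup"` three times from 20 seconds gives 15, 10 and then 5 seconds, and `apply_control(0.0, 60.0, "end")` resumes play back at 55 seconds.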



What is claimed is: 

1. A method of delivering caller-customized voice-based 
information to a caller, comprising: 

storing caller-specific information in a computer file at a 
universal resource locator (URL); 
determining a URL associated with the caller; 
retrieving the caller-specific information using the 
URL; 
processing at least one caller command received over 
the telephone to determine a service request; 
retrieving information responsive to the service request 
and responsive to the caller-specific information, 
including: 
generating a database query form responsive to the 
service request; 
customizing the database query form using the 
caller-specific information; and 
performing a database search using the query form, 
wherein generating a database query form responsive 
to the service request includes: 
storing a voice form associated with the service 
request at a universal resource locator (URL) 
address in the computer network wherein the 
voice form is stored in a markup language; 
playing the voice form to the caller to generate at 
least one information prompt for the caller; 
collecting information from the caller in response 
to each prompt; and 
generating a database query form using at least a 
portion of the collected information; and 
playing back the retrieved information to the 
caller over the telephone. 

2. The method of claim 1 wherein collecting information 
from the caller in response to each prompt includes collecting 
touch tone inputs from the caller. 

3. The method of claim 1 wherein collecting information 
from the caller in response to each prompt includes collecting 
voice command inputs from the caller and performing 
speech recognition on the voice command inputs. 

4. A method of processing voice-based information 
received from a telephone caller over a computer network, 
comprising: 

storing a voice form at a universal resource locator (URL) 
address in the computer network wherein the voice 
form is stored in a markup language with voice extensions; and 
during a calling session: 
playing the voice form to the caller to generate at least 
one information prompt to the caller; 
collecting information from the caller in response to 
each prompt; and 
storing the collected information in a first markup 
language document and including in the document a 
hyperlink to a second markup language document. 

5. The method of claim 4 wherein the hyperlink is 
determined responsive to at least a portion of the collected 
information. 

6. A method of processing voice-based information 
received from a telephone caller over a computer network, 
comprising: 

storing a voice form at a universal resource locator (URL) 
address in the computer network wherein the voice 
form is stored in a markup language with voice extensions; and 
during a calling session: 
playing the voice form to the caller to generate at least 
one information prompt for the caller; 
collecting information from the caller in response to 
each prompt; and 
storing the collected information in a first markup 
language document and including in a second 
markup language document a hyperlink to the first 
markup language document. 

7. The method of claim 6 wherein the hyperlink is 
determined responsive to at least a portion of the collected 
information. 

8. A system for delivering information over a telephone, 
comprising: 

a business white pages database including business name, 
address and phone number information; 
a database query form; 
a first processing agent programmed to: 
collect user information using a voice based telecommunications 
device; 
include at least some of the collected information to the 
database query form; 
search the database by applying the database query 
form to the database to retrieve information; and 
generate a voice web page having a universal resource 
locator (URL) address using the retrieved information; 
a yellow page database including business advertising 
information; and 
a second processing agent wherein the voice web page 
generated by the first processing agent includes a 
hyperlink to the second processing agent and wherein 
the second processing agent is programmed to: 
search the yellow page database to retrieve information; and 
generate a voice web page using the retrieved information; and 
a voice web browser adapted to play voice web pages 
to a user. 

9. The system of claim 8 wherein the hyperlink identifies 
an entry in the yellow page database and wherein searching 
the yellow page database comprises locating the yellow page 
database entry identified by the hyperlink. 

10. The system of claim 8 further comprising: 

an order page database including business order information; and 
a third processing agent wherein the voice web page 
generated by the second processing agent includes a 
second hyperlink to the third processing agent and 
wherein the third processing agent is programmed to: 
search the order page database to retrieve information; 
and 
generate a voice web page using the retrieved information. 

11. The system of claim 10 wherein the second hyperlink 
identifies an entry in the order page database and wherein 
searching the order page database comprises locating the 
order page database entry identified by the hyperlink. 